Imperfect information games (IIGs) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound of $\Omega(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the number of realizations required to learn these strategies with high probability, where $H$ is the length of the game, and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ realizations without this requirement by progressively adapting the regularization to the observations.
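For readers unfamiliar with the FTRL family the abstract builds on, the following is a minimal sketch of generic FTRL on the probability simplex with a fixed negative-entropy regularizer (which reduces to exponential weights), under trajectory-style bandit feedback. It is illustrative only: the paper's Balanced-FTRL and Adaptive-FTRL use game-tree-specific and adaptive regularizers, not this fixed one, and the names `ftrl_simplex`, `loss_fn`, and `eta` are hypothetical.

```python
import numpy as np

def ftrl_simplex(loss_fn, n_actions, n_rounds, eta=0.1, rng=None):
    """Generic FTRL on the simplex with negative-entropy regularization.

    Illustrative sketch only -- not the paper's Balanced-FTRL or
    Adaptive-FTRL, which regularize according to the game-tree structure.
    """
    rng = rng or np.random.default_rng(0)
    cumulative_loss = np.zeros(n_actions)
    for _ in range(n_rounds):
        # FTRL iterate: argmin_p <p, L_t> + (1/eta) * sum_a p_a log p_a,
        # whose closed form is a softmax over -eta * L_t.
        logits = -eta * cumulative_loss
        policy = np.exp(logits - logits.max())
        policy /= policy.sum()
        action = rng.choice(n_actions, p=policy)
        # Bandit feedback: only the played action's loss is observed;
        # an importance-weighted estimate updates the cumulative losses.
        loss = loss_fn(action)
        cumulative_loss[action] += loss / policy[action]
    return policy

# Example with a fixed loss vector; the policy concentrates on action 2.
final_policy = ftrl_simplex(lambda a: [0.9, 0.7, 0.1][a],
                            n_actions=3, n_rounds=5000)
```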
This work presents a new kinematic design for a prosthetic hand, driven by a single actuator, that enables both a tripod grip and a lateral grip. Inspired by three-digit prostheses, which are simpler, more robust, and cheaper than multi-digit prostheses, this new kinematics aims to provide an accessible prosthesis (affordable, easy to use, robust, easy to repair). Cables are used instead of rigid rods to transmit the motion of the fingers and thumb. The paper details the methodology and the design choices. Finally, an experimental evaluation of the prototype by users leads to a first discussion of the results.